



CS2200
Systems and Networks
Spring 2022

Lecture 10: Pipelining

Alexandros (Alex) Daglis School of Computer Science Georgia Institute of Technology adaglis@gatech.edu

Lecture slides adapted from Bill Leahy and Charles Lively of Georgia Tech

### Going faster...







station I
(place order)

station II (select bread)

station III (cheese) station IV (meat)





station I
(place order)

station II (select bread)

station III (cheese)

station IV (meat)





station I (place order) station II
(select bread)

station III (cheese)

station IV (meat)





station I
(place order)

station II (select bread)

station III (cheese)

station IV (meat)





station I
(place order)

station II (select bread)

station III (cheese) station IV (meat)

station V
(veggies)







Net result: One complete sandwich every five cycles.

### Bill's Mega-Sandwich Shop











station I
(place order)

station II (select bread)

station III (cheese)

station IV (meat)

station V
(veggies)

"What will You Have?"















Net result: One complete sandwich every cycle! (Once you fill the pipeline.) A 5x speedup!















### One at a time...



### Three at a time?



# Overlapping execution



### What do we need in the datapath?



#### LC-2200 Datapath 32 LdPC -LdA LdB LdIR LdMAR PC MAR IR Addr IR[31..0] Din Din WrREG WrMEM registers memory 16x 1024x func ALU: IR[19..0] regno 00: ADD 32 bits 32 bits 01: NAND 20 10: A - B Dout Dout 11: A + 1 sign extend DrREG DrPC DrALU DrMEM DrOFF =0? IR[27..24] Rx: 4 -bit register number to control logic IR[23..20] Ry: 4 -bit register number to control logic LdZ IR[3..0] ▶ Rz: 4 -bit register number to control logic IR[31..28] ◆ OP: 4 -bit opcode to control logic

→ Z: 1 -bit boolean to control logic

### What units are used?

Macro State Units in Use

FETCH IR, ALU, PC, MEM

DECODE IR

EXECUTE (ADD) IR, ALU, Reg-file

EXECUTE (LW) IR, ALU, Reg-file, MEM, Sign extender



### Imitation: The Sincerest Form of Flattery



station I (place order) New (5<sup>th</sup> order)



station II (select bread) 4<sup>th</sup> order



station III (cheese) 3<sup>rd</sup> order



station IV (meat) 2<sup>nd</sup> order



station V (veggies) 1<sup>st</sup> order

#### Points to note:

- Work done by each station is roughly equal
- Every sandwich goes through every station, regardless of what you want or don't want
- Order form and partially assembled sandwich is <u>passed from one station to the next</u>
- Each station does part of the work for assembling a sandwich

### How do we mimic sandwich assembly?



- Unequal division of labor
- Violates first principle in sandwich assembly line

### Bill's Cost-Cutting Measure



station I (place order) New (5<sup>th</sup> order)





station II (select bread) 4<sup>th</sup> order





station III (cheese) 3<sup>rd</sup> order





station IV (meat) 2<sup>nd</sup> order





station V(veggies)1<sup>st</sup> order



### Bill's Cost-Cutting Measure







station II (select bread) 2nd order



station III (cheese, meat, veggies) 1st order



(place order)

3rd order





#### Disaster!

- Increased waiting times
- One sandwich every 3 cycles, not one every cycle!
- Two workers who only work about a third of the time



# Bill originally used 5 stages rather than 3 in his sandwich pipeline because...

- A. He copied the instruction pipeline
- B. He could only find 5 workers
- C. He is solving the unemployment problem
- D. He wants to make sure that the amount of work per stage is roughly the same
- E. He is a nice guy
- F. You are making us hungry...

### How do we mimic sandwich assembly?



- Get to the basics!
- What needs to happen for every instruction?
- Fetch into IR and increment PC
- Decode instruction and read register contents
- Perform arithmetic/logic (maybe)
  - Perform address computation (maybe)
- Fetch/store memory operand (maybe)
- Write to register (maybe)

Each step is roughly the same amount of work

### How can we know what registers to read?

- What does the "bread guy" care about?
- How does the "bread guy" know what bread?
- It's the order form!



- What's the equivalent of the order form in the instruction pipeline?
- |R

### LC-2200 Instruction set

bits 27-0:

unused

R-type instructions (add, nand): bits 31-28: opcode; reg X; bits 27-24: reg Y; bits 19-4: unused (should be all 0s); bits 23-20: bits 3-0: reg Z I-type instructions (addi, lw, sw, beq): bits 31-28: opcode; bits 27-24: reg X bits 19-0: Imm. Offset bits 23-20: reg Y; J-type instructions (jalr): opcode; bits 31-28: bits 27-24: reg X bits 23-20: reg Y; bits 19-0: unused O-type instructions (halt):

bits 31-28:

opcode;

### Every instruction goes through



- Some stages don't do anything for some instructions
- Each stage works on a different instruction

```
IF - fetch instruction into IR and increment PC
```

ID/RR - decode and read register contents

EX - perform arithmetic/logic (maybe)

- perform address computation (maybe)

MEM - fetch/store memory operand (maybe)

WB - write to register (maybe)

### What moves between stages?



- Think of our sandwich assembly line
- Each station passes two things to the next
  - The order form
  - A partially assembled sandwich
- Analogies with our data path
  - Is the IR equivalent to the order form?
  - What is a "partially assembled data sandwich"?

### Electrical isolation between stages



- Registers act as a wall to isolate the actions of each combinational logic cloud!
- On each clock tick, the registers send their output to drive the next stage of combinational logic

### What's in these pipeline boxes?



- How do we separate our pipeline boxes?
- The boxes are really just combinational logic clouds
- So we already know how to separate them...

### This is where buffers come from...



- There is combinational logic in each cloud for performing the actions of that stage
- The buffers are merely clocked registers for passing the partially assembled results

### Again: What does each stage do?



F - fetch instruction into IR and increment PC

ID/RR - read register contents

EX \_ - perform arithmetic/logic (maybe)

- perform address computation (maybe)

MEM - fetch/store memory operand (maybe)

WB - write to register (maybe)

### Again: What does each stage need?



Conclusion: we need a little datapath in each stage

### A shiny new data path!



STAGE Units in Use

FETCH ALU, PC, MEM

ID/RR Reg-file, decode logic

EXEC ALU

MEM MEM

WB Reg-file

### Two register files? Not.



We use a register file in two stages

# Do you remember this device?



### Two register files? Not.



- We use a register file in two stages, but conveniently
  - ID/RR uses read ports (A addr, B addr B, A output, B output)
  - WB uses write port (Dest addr, Input)
- So we can happily share the same register file on each clock cycle

# Example of sharing the register file



#### But what about those buffers?



- What goes in those buffers? Same? Different?
- The different stages seem to need different information
- What was that sandwich analogy?





# Comparing the instruction pipeline...

...to Bill's sandwich pipeline, the buffers between stages serve the same function as

- A. A partially assembled sandwich
- B. Order form passed between stations
- C. A combination of the above two
- D. No comparison
- E. Buffer? You mean buffet?
- F. You are making us hungry... again!



# So what makes up our partially assembled data sandwich?



- The IR (or parts of it) is the order form
- The data sandwich is the partial execution results
- Do the buffers have to all carry the same information? Or can they be different?

### What does each stage need in the buffers?



FBUF DBUF EBUF MBUF

|   | Vame | Output of Stage | Contents                                                                |  |
|---|------|-----------------|-------------------------------------------------------------------------|--|
| F | BUF  | F               | Primarily contains instruction read from memory                         |  |
|   | DBUF | ID/RR           | Decoded IR and values read from register file                           |  |
| E | BUF  | EX              | Primarily contains result of ALU operation plus other parts of the      |  |
|   |      |                 | instruction depending on the instruction specifics                      |  |
| 1 | 1BUF | MEM             | Same as EBUF if instruction is not LW or SW; If instruction is LW, then |  |
|   |      |                 | buffer contains the contents of memory location read                    |  |

#### Our non-pipelined LC-2200



At each point, the processor is in exactly one macro state



# What "macro state" is a pipelined processor in at any point in time?

- A. Fetch
- B. Decode
- C. Execute
- D. All of the above
- E. We need a new "sandwich" state

#### Anatomy of an ADD instruction

- IF stage (cycle I)
- ID/RR stage (cycle 2)
- EX stage (cycle 3)
- MEM stage (cycle 4)
- WB stage (cycle 5)

#### Actions for an ADD instruction

| Stage | Actions                                                                                               | Comments                                                                                                          |
|-------|-------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| IF    | MEM[PC] → FBUF.IR<br>PC + I → PC                                                                      | Instruction is fetched from memory and placed in FBUF (which is essentially the IR)                               |
| ID/RR | DPRF[FBUF.IR.Ry] → DBUF.A DPRF[FBUF.IR.Rz] → DBUF.B FBUF.IR.Opcode → DBUF.Opcode FBUF.IR.Rx → DBUF.Rx | Read register Ry into DBUF.A Read register Rz into DBUF.B Copy opcode from FBUF to DBUF Copy Rx from FBUF to DBUF |
| EX    | DBUF.A + DBUF.B → EBUF.Result DBUF.OPCODE → EBUF.Opcode DBUF.Rx → EBUF.Rx                             | Perform addition Copy opcode from DBUF to EBUF Copy Rx from DBUF to EBUF                                          |
| MEM   | EBUF → MBUF                                                                                           | Copy EBUF to MBUF                                                                                                 |
| WB    | MBUF.Result → DPRF[MBUF.Rx]                                                                           | Write the result of addition into register Rx                                                                     |

#### Actions for an ADD instruction

| Stage | Actions                                                                           | Comments                                                |  |
|-------|-----------------------------------------------------------------------------------|---------------------------------------------------------|--|
| IF    | MEM[PC] → FBUF.IR                                                                 | Instruction is fetched from memory and placed           |  |
|       | PC + I → PC                                                                       | Considering only the Add                                |  |
| ID/RR | DPRF[FBUF.IR.Ry] → DBUF.A                                                         | instruction, quantify the sizes of                      |  |
|       | DPRF[FBUF.IR.Rz] → DBUF.B<br>FBUF.IR.Opcode → DBUF.Opcode<br>FBUF.IR.Rx → DBUF.Rx | the various buffers between the stages of the pipeline. |  |
|       |                                                                                   |                                                         |  |
|       |                                                                                   | EDI IE2                                                 |  |
| EX    | DBUF.A + DBUF.B -> EBUF.Result                                                    | FBUF? lition  DBUF? le from DBUF to EBUF                |  |
|       | DBUF.OPCODE → EBUF.Opcode  DBUF.R× → EBUF.R×                                      |                                                         |  |
|       | DBOLIX 7 EBOLIX                                                                   |                                                         |  |
| MEM   | EBUF → MBUF                                                                       | MBUF? to MBUF                                           |  |
|       |                                                                                   | - 1 /                                                   |  |
| WB    | MBUF.Result → DPRF[MBUF.Rx]                                                       | Write the result of addition into register Rx           |  |

#### How does this pipeline work for ADD?



#### How does this pipeline work for ADDI?



# How to generalize design of pipeline register?

- Look at each instruction in the ISA
- Decide what needs to be carried over from one stage to the next

Opcode other fields

### Design the DBUF register for LC-2200

- Don't optimize by overloading the different fields of the register
- Decide what fields need to be in DBUF for each instruction in the ISA

| ADD/NAND                                                                      | ADDI                                                                            | BEQ                                                                              | And so on |
|-------------------------------------------------------------------------------|---------------------------------------------------------------------------------|----------------------------------------------------------------------------------|-----------|
| struct DBUF {    int Opcode:4;    int Rx:4;    int A:32;    int B:32; } DBUF; | struct DBUF {    int Opcode:4;    int Rx:4;    int A:32;    int imm:20; } DBUF; | struct DBUF {    int Opcode:4;    int Rx:4;    int imm:20;    int PC:32; } DBUF; |           |

### Now take the UNION of the requirements

| ADD/NAND                                                                  | ADDI                                                                            | BEQ                                                                              | And so on |
|---------------------------------------------------------------------------|---------------------------------------------------------------------------------|----------------------------------------------------------------------------------|-----------|
| struct DBUF {   int Opcode:4;   int Rx:4;   int A:32;   int B:32; } DBUF; | struct DBUF {    int Opcode:4;    int Rx:4;    int A:32;    int imm:20; } DBUF; | struct DBUF {    int Opcode:4;    int Rx:4;    int imm:20;    int PC:32; } DBUF; |           |



```
struct DBUF {
  int Opcode:4;
  int Rx:4;
  int A:32;
  int B:32;
  int imm:20;
  int PC:32;
} DBUF;
```

#### Now take the UNION of the requirements

```
struct DBUF {
  int Opcode:4;
  int Rx:4;
  int A:32;
  int B:32;
  int imm:20;
  int PC:32;
} DBUF;
```

- In the ID/RR stage, fill in fields depending on the instruction in FBUF.IR
- Leave unneeded fields unfilled
- What fields would we fill in for
   JALR \$at,\$ra # \$ra ← PC; PC ← \$at

# For the BEQ instruction, the fields in DBUF not filled in...

```
BEQ Rx, Ry, offset ; If (Rx == Ry) PC \leftarrow PC + offset
```

- A. A (contents of Ry)
- B. B (contents of Rx)
- C. PC
- D. Immediate value
- E. Rx specifier

```
struct DBUF {
  int Opcode:4;
  int Rx:4;
  int A:32;
  int B:32;
  int imm:20;
  int PC:32;
} DBUF;
```

